---
title: "MultiFileConverter"
id: multifileconverter
slug: "/multifileconverter"
description: "Converts CSV, DOCX, HTML, JSON, MD, PPTX, PDF, TXT, and XSLX files to documents."
---

# MultiFileConverter

Converts CSV, DOCX, HTML, JSON, MD, PPTX, PDF, TXT, and XSLX files to documents.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | Before PreProcessors , or right at the beginning of an indexing pipeline |
| **Mandatory run variables** | `sources`: A list of file paths or ByteStream objects |
| **Output variables** | `documents`: A list of converted documents  <br /> <br />`unclassified`: A list of uncategorized file paths or byte streams |
| **API reference** | [Converters](/reference/converters-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/multi_file_converter.py |

</div>

## Overview

`MultiFileConverter` converts input files of various file types into documents.

It is a SuperComponent that combines a [`FileTypeRouter`](../routers/filetyperouter.mdx), nine converters and a [`DocumentJoiner`](../joiners/documentjoiner.mdx) into a single component.

### Parameters

To initialize `MultiFileConverter`, there are no mandatory parameters. Optionally, you can provide `encoding` and `json_content_key` parameters.

The `json_content_key` parameter lets you specify for the JSON files which key in the extracted data will be the document's content. The parameter is passed on to the underlying [`JSONConverter`](jsonconverter.mdx) component.

The `encoding` parameter lets you specify the default encoding of the TXT, CSV, and MD files. If you don't provide any value, the component uses `utf-8` by default. Note that if the encoding is specified in the metadata of an input ByteStream, it will override this parameter's setting. The parameter is passed on to the underlying [`TextFileToDocument`](textfiletodocument.mdx) and [`CSVToDocument`](csvtodocument.mdx) components.

## Usage

Install dependencies for all supported file types to use the `MultiFileConverter`:

```shell
pip install pypdf markdown-it-py  mdit_plain trafilatura python-pptx python-docx jq openpyxl tabulate pandas
```

### On its own

```python
from haystack.components.converters import MultiFileConverter

converter = MultiFileConverter()
converter.run(sources=["test.txt", "test.pdf"], meta={})
```

### In a pipeline

You can also use `MultiFileConverter` in your indexing pipeline.

```python
from haystack import Pipeline
from haystack.components.converters import MultiFileConverter
from haystack.components.preprocessors import DocumentPreprocessor
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()

pipeline = Pipeline()
pipeline.add_component("converter", MultiFileConverter())
pipeline.add_component("preprocessor", DocumentPreprocessor())
pipeline.add_component("writer", DocumentWriter(document_store = document_store))
pipeline.connect("converter", "preprocessor")
pipeline.connect("preprocessor", "writer")

result = pipeline.run(data={"sources": ["test.txt", "test.pdf"]})

print(result)
## {'writer': {'documents_written': 3}}
```
